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Abstract 

We formulate the RNA folding problem as an x matrix field theory. 
This matrix formalism allows us to give a systematic classification of the 
terms in the partition function according to their topological character. The 
theory is set up in such a way that the limit N ^ oo yields the so-called 
secondary structure (Hartree theory). Tertiary structure and pseudo- knots 
are obtained by calculating the 1/A^^ corrections to the partition function. 
We propose a generalization of the Hartree recursion relation to generate the 
tertiary structure. 
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I. INTRODUCTION 



Over the last decade, RNA has transformed itself from being a relatively minor player 
in the central dogma of Watson and Crick to being one of the central players in molecu- 
lar biology. Indeed, it has been recently demonstrated that in addition to its "information 
carrier" role in protein synthesis, some types of RNA's, known as ribozymes, have an enzy- 
matic activity which is crucial to the functioning of the cell [|T|. As a consequence of this 
new prominent role of RNA, the search for the three dimensional structure of RNA has 
become an important problem in biology. This view was expressed forcefully by Tinoco and 
Bustamante 

As this paper is addressed to theoretical physicists, we begin with a schematic review. 
A very thorough review on RNA folding can be found in ref. 0. 

RNA is a heteropolymer constructed out of a four-letter alphabet, C, G, A, and U (for 
the four bases or nucleotides cytosine, guanine, adenine, and uracil). The length of an RNA 
chain ranges typically from 76 for tRNA to a few thousand base pairs for mRNA. In solution, 
there is an attraction between C and G and between A and U, with energies e{C, G) ^ —3 
kCal/mole and e{A,U) ~ —2 kCal/mole respectively. There is also a weaker attraction 
between G and U, with energy e{G, f/) ~ — 1 kCal/mole. Note the correspondence 300 K ~ 
0.6 kCal /mole ~ 1/40 eV. 

Consider an RNA sequence {s} = {si, S2, ■ ■ ■ , sl} (where Si takes on one of the 
four possible values G, G, A, and U). For example, we might be given the sequence 
{s} = {GGGGAAAUUGGUAG ■ ■ ■}. The attraction between the nucleotides folds the 
RNA heteropolymer into a 3— dimensional structure referred to as a shape. Biological func- 
tions depend largely on the shape assumed by a particular RNA. Thus, the map from 
sequence space to shape space is of great importance in molecular biology and has been 
much discussed in the biophysical literature. As mentioned above, this has been even more 
true since the discovery of the enzymatic activity of some RNA. 

In the molecular biology of biopolymers, it is conventional to define three levels of struc- 
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tures. The primary structure is just the chemical sequence, or sequence of nucleotides. The 
secondary structure is the local short-range pairing of complementary bases, leading to seg- 
ments of helices separated by loops and bulges ("clover-leaf" structure). Finally, the tertiary 
structure is the spatial arrangement of these secondary motifs, in which the loops and bulges 
themselves can partially pair, leading to the so-called pseudo-knots (see fig. la). 




Fig. la: Secondary (left) and tertiary (right) structure of a tRNA.(From I. Tinoco, with 
permission.) 

An example of pseudo knot is the "kissing hairpin" Fig. lb. 




Fig. lb: A "kissing hairpin" . 



In contrast to the problem of protein folding , RNA folding is hierarchical in that its 
secondary structure is much more stable than its tertiary structure, which can be treated as 
a perturbation [Q. Experimentally, the two levels of folding (secondary and tertiary) can be 
separated by varying the concentration of Mg"^+ ions In addition, the attractive force 
between nucleotides saturates. Once a given nucleotide C has paired with a nucleotide G, 
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it cannot be paired with yet another G. In contrast, the attraction between amino acids do 
not saturate. Thus, the problem of RNA folding is considerably simpler than the problem 
of protein folding. 

The determination of secondary structure has reached a very high level of sophistication 
based on dynamic programming algorithms 0-0]. 

The problem of RNA folding is clearly topological in flavor and is thus not easily amenable 
to dynamic programming methods, although some algorithm has been proposed recently [llO| . 
On the other hand, we know from the field theoretic literature that topological considerations 
also play an important role in such subjects as matrix theory or M— theory. In this paper, 
we propose that matrix theory may be useful to the problem of RNA folding. We develop 
a matrix theoretic representation of the topological aspect of RNA folding. 

In section I, we formulate the RNA folding problem more precisely. In section II, we 
show how it can be formulated as an x matrix field theory. In section III, we show 
that the A^ dependence of the field theory can be made explicit in the functional integral 
formulation of the problem. As a result, the natural way to compute the 1/A^ expansion is 
through a steepest descent method which is described in section IV. As this expansion is 
very complicated to perform at higher order, we resort in section V to recursion relations 
which allow us to approximately incorporate the higher order powers in 1/A^. 

For a simple introduction to this work, one can go for instance to the website 
[http : / / online .itp.ucsb.edu/ online / infobioO 1 / zee / 

II. RNA FOLDING 

Given an RNA sequence {s} = {si, S2, ■ ■ ■ , sl} of L bases, let us write down the partition 
function Z at temperature 1/13. We will proceed in steps. 
First, construct the matrix 

= e-ms^,sm\n-n\)Q^^^_^^ > 4),^ ^ j; Vu = 0. (1) 
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where e(a, b) denotes the 4 by 4 real symmetric matrix giving the attractive energy between 
nucleotides, e{A, U) etc. We set the diagonal elements Va to to indicate the fact that a 
nucleotide does not attract itself. The Heaviside function d{\i~j\ > 4) incorporates the fact 
that the RNA molecule is not infinitely flexible and we cannot pair nucleotides separated 
by less than 4 sites. The attractive potential can be taken to be v{r) = —w9{R — r) with w 
and R the strength and range of the attraction respectively. 
Now construct 

^ = 1 + E + E ^^i^ki + ■■■+ E ^ikV.i + ■ ■ ■ (2) 

<ij> <ijkl> <ijkl> 

where < ij > denotes all pairs with j > i, < ijkl > all quadruplets with I > k > j > i, and 
so on. Then the partition function is given by 

Z = f f[d'rkl[fi\n+i-n\) Z (3) 

k=l i=l 

The function /(r) can be taken to be, for example, 5{r — I) for a model in which the 
nucleotides are connected along the RNA heteropolymer by rigid rods of length /, or 
g-(»'-0^/6o-^fQj, a model with elastic rods. Note that the saturation of the hydrogen bond 
has been incorporated by the requirement I > k > j > i, and so on. Once the nucleotide at 
i has interacted with the nucleotide at j it cannot interact with the nucleotide at k . Note 
that in (|^), only the enthalpy and combinatorics of pairings are included. The integration 
over the atomic coordinates in (|) accounts for the actual topological feasibility of a given 
pairing and also for the entropic factor associated with loop formation. 

Biologists are interested in the folded configuration essentially at room temperature. 
Since room temperature is substantially less than the melting temperature (of order 80°C, 
in other words, the characteristic energy scale of the problem), we want to determine the 
ground state configuration of the RNA heteropolymer. In other words, once we have obtained 
Z we would like to extract the term in Z that dominates as l3e tends to infinity in (|1]). 

We have given a simplified quantitative framework for the RNA folding problem. From 
a chemical point of view, it would be appropriate to include also the stacking energies of 
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couples of complementary base pairs, instead of energies of single pairs of bases. However, 
in the following, we will stick with the latter. We will also concentrate on the evaluation 
of the "pairing" partition function (^. We expect that the various effects we have ignored, 
such as stacking , etc., can be added later as "bells and whistles" to our approach. The 
stacking energies for instance can be taken into account by utilizing a 16 x 16 interaction 
matrix between pairs of bases instead of the 4x4 matrix e{si, Sj) we use here. 

III. MATRIX THEORY 

What is the connection with matrix theory? 

Consider pulling apart the folded RNA structure given in fig. 2a. 




Fig. 2a: Representation of the secondary structure of an RNA. 

We obtain the structure of fig. 2b which to physicists are reminiscent of Feynman dia- 
grams in a variety of subjects: matrix theory, quantum chromodynamics, and so on. 




Fig. 2b: Representation of the same RNA stretched. 



For the sake of definiteness, let us borrow the terminology of quantum chromodynamics. 
The dotted lines are known as gluon propagators, and the solid line as a quark propagator. 
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The secondary structure corresponds to diagrams in which the gluon hues do not cross over 
each other, while the tertiary structure corresponds to diagrams in which the gluon lines do 
cross. 

The crucial observation, originally made by 't Hooft |Tl[], is that there is a systematic 



relation between the topology of a graph and its corresponding power of l/N"^. For instance, 
planar diagrams are of order 1/A^°, and diagrams in which gluon lines cross are of higher 
order. We merely have to go to the large N expansion, and the diagrams are classified by 
powers of l/N"^. Note that a somewhat similar formulation in terms of matrix theory has 



been used for the meander problem 
Consider the quantity 

Here (yjj (i = 1, ■ ■ ■ , L) denote L independent by Hermitian matrices and 11/(1 + ipi) 
represents the ordered matrix product (1 + {pi){\ + '^2) • ■ ■ (1 + All matrix products will 
be understood as ordered in this paper. The normalization factor A{L) is defined by 

A{L) = / n c/y^fce-^^-^''")'^*''^'^'^^) (5) 
k=i 

Let us refer to the row and column indices a and h of the matrices (v5j)a color indices, 
with a,b = 1,2, ■ ■ ■ , N. The matrix integral (H) defines a matrix theory with L matrices. We 
can either think of it as a Gaussian theory with a complicated observable -^trn;(l + ipi), or 
alternatively, by raising j^tTlli{l + (^1) = e^°^['^*'''^'*^^+'^')l into the exponent, as a complicated 
matrix theory with the action (YJ2ijiV~^)ij^^{Vi'^j) — ^og[j^trIli{l + v^/)]). Another trivial 
remark is that we can effectively remove -^tr from (^). 

The important remark is that the matrix theory [|I^ defined by has the same topolog- 
ical structure as 't Hooft 's large N quantum chromodynamics. There are L types of gluons, 
and the gluon propagators are given by j^Vij. As in large quantum chromodynamics, each 
gluon propagator is associated with a factor of and each color loop is associated with a 
factor of A^. The reader familiar with matrix theory or large A^ quantum chromodynamics 



can see immediately that the Gaussian matrix integral (Q) evaluates precisely to the infinite 
series 

Z(1,L) = 1+ E V,,Vu + --- + ^^ E V,kV,i + --- (6) 

<ii> <ijkl> <ijkl> 

Some "typical" terms in this series correspond to the diagrams in fig. 3. 



Fig. 3: Graphical representation of a few terms of the partition function. 

This differs from (Q) only in that the terms with different topological character are now 
classified by inverse powers of Thus, the use of the large expansion allows us to sepa- 
rate out the tertiary structure, represented in (||) for example by the term -^J^KijkiyVikVji, 
from the secondary structure. 

Note that the ordered product + ipi) ensures that the diagonal elements Vu of the 
matrix V do not appear in Z{1, L). We have nevertheless already set Va to 0. 

The program proposed in this paper is thus to evaluate Z{1,L) with V an arbitrary 
matrix. Once Z{1, L) is known we can then insert it into (|^) to evaluate Z. The parameter 

serves as a convenient marker to distinguish the tertiary structure from the secondary 
structure. What we offer here is a systematic way of generating refinements to the calculation 
of Z, and hence the free energy F, to any desired accuracy in a well controlled approximation. 

Since in Z{1, L) the quantities 1 and L represent arbitrary labels we can just as well 
define 

Z(m,n) = — ^ / nrf^fce-^^".-'"(^"')-*'-('^'^^)ltrn(l + ^^ (7) 
where again the normalization is given by 

k=m 

As we shall see in the following, we will construct recursion relations to evaluate (|^ 
approximately. These recursion relations can be easily programmed to calculate the free 
energy of the RNA chain. 
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IV. LARGE N 



In the matrix representation appears implicitly in the size of the matrices <^j- In 

order to study the large limit, we need to extract the dependence explicitly, for which 
we have developed the following method. Define Gi>i by Giq = + ^Pi) for /' — !>/, 

Gi^k,i = for all k > 0, and Gu = 1. Then Gin satisfies the equation 

Gm - (1 + ipi'-i)Gi>^i,i = 6in (8) 

Thus, if we define Miq = 6i'i — (1 + v^«'-i)^r-i,i then we see that Gw is the inverse of the 
matrix Mw and thus 

We have used the standard representation of the inverse of a matrix by an integral 
over Grassmanian fermionic variables ipi and ipi . Note the felicitous fact that detM = 
/ d'il)*dipe~^ = 1 which allows us to write without a denominator. 

To compactify this representation of Z further we introduce M{h)ij = Mij + h6i^i6j^L+i 
and write 



NdhA{L) 

Henceforth, it is understood that after differentiation with respect to h we set h to 0. 
We can now perform the Gaussian integration over (f^, thus obtaining 

L) = ^^l #*#e-^°('^*'^)-^^('^*'^) (12) 

with the free fermion action 

Soir,^) = E(v^; - + ^^iV^+i (13) 

and the interacting fermion action 



2N 



Note that in (|T4|) we have displayed the color indices a and h explicitly. 
We next rewrite 

id' 

in terms of the color singlet variable 

a 

Now use the Gaussian representation 



(14) 



(15) 



(16) 



(17) 



with the normalization factor C = J dAe'^^"^^^ . Note that even though K is complex we can 
take A to be hermitean. ( Equivalent ly, the anti-hermitean part of A drops out.) Putting it 
together we obtain 



j dip*dip 



where 



(18) 



(19) 



or in matrix form 



Mr 



1 

-11 + ai2 ciis 

«12 -1 1 + «23 



h 
Oil 
a2L 



-1 l + ai-iL 



-1 



(20) 
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where we have used the convenient notation 



i\JVij Aij = aij for i < j 

Aj = a*, for j < i (21) 

The point of these manipulations is that in (|l^) we have now isolated the color index a 
so that the integral over ip* and ip factors into copies of the same integral, thus giving 

Z{l,L) = / dAe-T-^\detM{A)f = / dAe-^-^'^^-^o,MiA, (22) 

At this point, we can differentiate with respect to h and set h to 0, obtaining the alternative 
form 

Z{1, L) = ^J rfAe-Tt'^^'+^t^i°g^(^)M-^(A),.+i,i (23) 

In this expression, 

= Sij - 5,j+i + i{Vi.i^j)^Ai_ij (24) 

Let us introduce the action 

S{A) = -tiA^ - tr log M{A) (25) 
2 

and define the average of an "observable" O by 

< O >= ^ / dAe-^^(^)0 (26) 

(Note the non-standard normalization used here.) Then, our result can be summarized 
elegantly as 

Z{1,L)=<M-\A)l+i,i> (27) 

At this point, as remarked earlier, we note that the quantity Z{1,L) can obviously be 
generalized to Z{i,j): after all, the site labels 1 and L are arbitrary. Then we have the 
appealing result that 
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Z{t,j) =< M-\A)j+i, > for j > i (28) 
It is also useful to introduce the free action 

SoiA) = hiA' (29) 

and to define 

<0>o=^J dAe-^^°(^)0. (30) 
Then we can also write our result as 

Z{l,L) = ^-^<{detM{A)f>o (31) 

Remarkably, it turns out that we will need both the representations (pT]) and (0) later 
in a single calculation. 

Incidentally, our formulation of the RNA folding problem can be immediately adapted 
to the marriage problem (or bipartite matching problem) |1^ ||I5[ ||T6[ ||T^, one of the 
classic problems in combinatorial optimization. We will mention only the simplest version 
here. Label L (with L even) individuals by the index i = 1, ■ ■ ■ ,L where the individual 
is male for i odd and female for i even. Define a matrix Vij = |(1 — (— 1 )*'*'■' )e~^^'^ where 
Eij represents the energy cost of a marriage between i and j and its negative provides a 
measure of happiness. Referring back to @ we see that we want to extract in Z{1,L) 
all the terms with L/2 powers of V, for example the term V14V38 ■ ■ ■ Vl_i^2 = 6"^^^°'"' 
with Exotai = -E'14 + -f'ss ■ ■ Siucc we now want to include possible crossings in the 

Feynman diagram language we can set the number of colors to 1. Thus, from (p2D, we 
have immediately Z{l,L) = J dAe~ 2^'^' det M (A). Referring to (0) we see that the 
differentiation with respect to h and setting h to amounts to replacing the L + 1 by L + 1 
matrix M{A) by the L by L matrix obtained by deleting the first row and last column. 
Furthermore, since we want the terms with L/2 powers of V, that is, with L powers of , 
we can set the I's and — I's in this matrix to 0. Denoting the resulting matrix by Ai{A), 
we obtain the following representation for the marriage problem 
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Z^{L) = rfAe-^f^' det MiA) (32) 

Clearly, the representation given here can be generalized in a number of directions, for 
example, by including individuals who remain single. 

It is easy to see how this representation works: the Gaussian integration insures that in 
det A^(y4) only the appropriate terms are picked out. 

V. STEEPEST DESCENT 

The fact that we have been able to display explicitly the dependence is crucial and 
allows us in principle to carry the expansion to any order. The standard strategy to 
evaluate integrals such as ( p3D is of course to use the method of steepest descent ( [0, [pJ|]). 

To leading order the steepest descent approximation is easy enough to carry out. The 
stationary point is determined by = 0, that is 

Aij,=tiVik)-^Gi.i,k+i (33) 

where we find it useful to define 

G., = (M-i),+i,, (34) 

Notice that with this definition is defined for i from to L — 1 and for j from 2 to L + 1. 
The identity J2j Mij{M~^)jk = ^ik can now be written as 

Gi+i^k " Gik — ^ Vi+ijGij+iGj-i^k = ^i+2,k (35) 



Referring to (^) and (|28| ) we see that to leading order in steepest descent, Z{i,j) is just 
M~^{A)j+i^i = Gji evaluated at the stationary point. 

This equation ( pSf ) has already been written down in the literature pO|,|7|-P,pll-p5| and is 
known as the "Hartree approximation". It has the obvious interpretation (see fig. 4) that 
to lowest order the additive effect of including one extra nucleotide labelled by L + 1 to the 
RNA heteropolymer can be described by pairing that nucleotide to the nucleotide labeled 
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by j, which separates the heteropolymer into two segments, one from 1 to j and the other 
from J ' + 1 to L + 1. We then sum over all possible j of course. 



i+1 k i i+1 k 



i+1 



Fig. 4: Graphical representation of the Hartree recursion relation. The thick line represents 
the propagator G. 

In principle, steepest descent gives a systematic expansion of Z{1,L) to any desired 
power of jj: by expanding the exponent and the observable around the saddle-point. In 
the present context, this implies that the full three dimensional structure of the RNA can 
be obtained by expanding around the secondary structure. In particular, the higher order 
terms do not disrupt the secondary structure, but merely add new interactions, in addition 
to the existing secondary pairing. This is in marked contrast with protein folding, where it 
is known that there is a strong correlation between secondary and tertiary structure. 

In practice, however, it proves to be quite tedious to calculate the terms explicitly. 
In the integral in (p3D we are now to replace Aij by Aij + Xij/ \/N where Aij is determined 
by ( p3|) and (|35|) . A straightforward calculation shows that 

Z{l,L)=Jdxe^pi--tTx'--tT{M-'c) -E^^^tr(M-^c) 



X 



p=3- 

P=l / J L+1,1 



where is related to G through equation (0), and cw = ^Vi^i^i> . The systematic 

corrections to Z are obtained by expanding ( p6D in powers of By symmetry, no 

half-integer powers of N remain in the expansion of Z. 

The first thing to evaluate is the propagator of the fluctuation fields x^. This is just 
the inverse of the kernel of the quadratic form appearing in the exponent of (PU]). This 
propagator Aij^ki is in fact a scattering amplitude and satisfies a form of the Bethe-Salpeter 



equation | 
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^kLmn 



tj,mn 



(37) 



where G are the Hartree propagators ([35|). In fig. 5, we show a graphical representation of 
this recursion equation, as well as the series of graphs it resums. It is clear that this equation 
resums all the possible ladder (or rainbow) diagrams to this order. 
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Fig. 5: Graphical representation of the Bethe-Salpeter recursion relation. The dotted lines 
represent factors ^JVij while the dashed lines represent factors Vij. The solid thick lines 
represent Hartree propagators G. The Hartree propagators being directed, the arrows denote 
the direction of increasing spatial index. 

This equation is to be solved for the particular sequence studied. The scattering ampli- 
tude A defines the contractions of the x fields, and thus its knowledge allows us in principle 
to calculate (^) to any order. Note that as usual in field theory, only contractions which 
are linked to the operator that we calculate are to be included. (This reduces considerably 
the number of contraction). 

A fairly simple calculation allows us to show that the correction vanishes identically 
(see appendix A). This result appears true by drawing a few graphs, but this gives an 
algebraic proof. 

It is easy to see that we have to expand (|36D to O(x^) in order to calculate the free energy 
to order The calculation, although cumbersome, is straightforward. The free energy 
reads 

Z{l,L)=GiL 
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+ At < I (--BiTr, + —BiT^Ti - —BiT'^ 
m I V 5 ^ 12 ^ ^ ^ 162 ^ ^ 

- + ^B^T^ - \b;T^ + B^M'^\ > (38) 

4 lo d J J L+1,1 

where we have used the notation 

Dmn = X! ^mrn'\/^m'-l,n Xm'-l,n 
m' 

(Bp),, = mu 

Tp = TiBp (39) 

In (p8D, the bracket means that the Wick theorem should be apphed to contract the 
fields xiii which appear in this expression, their contraction being given by the kernel A. 

The calculation of the correction to the free energy is possible numerically for not too 
long RNA sequences. Work in this direction is in progress. 

Because of the complexity of the (exact) order 1/A^^ obtained in this approach, we 
found it simpler to generalize the Hartree recursion equation to incorporate some residual 
interactions between the loops and bulges. 



VI. RECURSION APPROACH 

Two approaches can be used to derive recursion relations for the partition functions. 
One is detailed in the following, whereas the other one is described in appendix B. 
A possible approach is to take the expression in ( pT] ) 

Z{l,L) = j^-^<{detM{A)f>o (40) 

and try to relate Z{1,L + 1) to Z{1,L). In other words, we would like to relate 
< {det Ml+i{A))^ > to < {det Ml{A))^ > where the subscript on M keeps track of the 
different matrices in the discussion. Note that is an L + 1 by L + 1 matrix. Explicitly, 
as noted before, the L + 2 by L + 2 matrix M^+i has the form 
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L+l 



1 

-11 + ai2 023 

al2 -1 



h 
bi 
62 



h* h* 



where for convenience we have denoted 



^ A / ^ij ^ij 



i\l Vij 



L+l,j ^L+lj 



-1 1 + 6l 

K-i K -1 1 

aij for i < j < L 

bi ior i < L 

a*j for j < i < L 

b* for j < L 



(41) 



(42) 



Our strategy is to first perform the Gaussian integration over the 6j's in evaluating 
< (det Ml^i{A))^ >, keeping in mind that we need the terms of order h. This method of 



integrating out a row and a column has also been used in random matrix theory |27 . 

We briefly outline the procedure. Write M^^+i = M2,+i(6 = 0) +-B where B is the matrix 
extracted from (^) upon keeping only the entries which depend on the fe's and 6*'s. Expand 
(det Mi+i(A))^ in powers of B and then perform the Gaussian average over the 6's and 6*'s, 
using < bib* >= jj6ijVj^L+i- After some arithmetic, we obtain 

Z{1,L + 1) = Z{1,L) 

+ E V,,L^i < (det Mf[{^M-l^,)MEl,,^, - j^{^MEi,^r.+2)M-J^,] >o (43) 

We have suppressed the subscript L + 1 on the matrix M on the right hand side. It is 
understood that this expression is to be evaluated at h = 0. Noting that the matrix ^ is 
particularly simple and that {M~^) 1+2,1+2 = 1, we find that 



Z(l, L + 1) = Z(l, L)+Y: W < Mll,,_,,M-l - -j^Mzl.^.Mrj^, > 



(44) 
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Using the definition of tlie connected expectation value 

< AB >=< A >< B > + < AB >c 

we note, as is well-known, that the connected part is of order ( [^) and we can thus 

write 

Z(l, L + 1) = Z(l, L) + X: V,,L+i < MEl,,+, X M-l > (45) 

i=i 

+ Y: V,,l+i < M^l,^^^,Mrl >c (46) 
i=i 

E y^,L+i < MEl,,,M.j^, >c (47) 
Recalling (|28| ) we recognize that the quantities < M£|^j_,_]^ > and < Mj^^ > appearing 



in the second term on the right hand side of (|45|) are nothing but Z{j + 1, L + 1) and Z{l,j) 



respectively. Thus, if we keep only the first two terms on the right hand side of (^5|) we 
obtain the closed recursion relation 

L + 1) ^ Z(l, L) + f: V,,L+iZ{l,j)Zij + 1, L + 1) (48) 

i=i 

This is precisely the recursion relation in the Hartree approximation (^) mentioned 
earlier. 

As announced in the introduction, the formulation given here offers a systematic way 
to go beyond the Hartree approximation. We expect the third and fourth term on the 
right hand side of (|45|), when evaluated to leading order in to give the corrections of 
order It is intriguing then that the superficially similar objects < M]^^^^-j^^Mj^l >c 
and < M^^^^^^Mj'J^i >c must be of order and order respectively. We note however 
that a "backward-propagating object" which we define as Mjj^ with k > j makes its first 
appearance in < M£^^ iM~j_^_i >c- All other terms in ( ^5D involve only forward-propagating 
objects. 

We can of course calculate (^) explicitly for small L in order to check our formulation 
and the order of the various terms in ^ . The result for L = 5 is shown graphically in fig. 6. 



Fig. 6: A few graphs corresponding to the l/N"^ term. 



A. Recursion Relation 

While the recursion relation ( ^5|) has an appealing structure, we are not able to evaluate 
the two objects < M^^^j^-^^M'^ >c and < M£^-^^^^M~j^i >c and express them in a simple 
form. Neither should we be able to do that. Our experience in field theory, for example the 
Dyson-Schwinger equation in quantum electrodynamics, indicates that recursion relations 
generically do not close: new objects appear in the right hand side. There is no reason why 
< M£l^ .j_^_^M~i >c should be expressible in terms of < Mljj} >. New objects, corresponding 
to vertex functions in field theory, must appear. 

Fortunately, we can inspect the set of Feynman diagrams to obtain a recursion relation for 
Z{i,j). We propose the following recursion relation. Given Z{i,j) for all i and j satisfying 
j — i < L — 1, we obtain Z{i,j) for all i and j satisfying j — i < L as follows. 

First, define Z^^\i,j) as the one-particle irreducible (IPI) part of Z{i,j), that is the 
sum of all those diagrams in Z{i,j) that do not fall apart into two separate pieces when a 
quark propagator is cut. Some examples are shown in fig. 7a. 




Fig. 7a: A few one particle irreducible graphs. 



In fig. 7b, we show a different representation of the third graph of fig. 7a 
The concept of, and the necessity of introducing, one-particle irreducibility is of course 
the same here as in field theory such as quantum electrodynamics. 
Second, define the vertex function for n > j > m by 

^In = - E V,,^]{Z'^\m,n) - 1) (49) 

kjtj jk 
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Fig. 7b: The third graph of fig. 7a 

Using the language of quantum chromodynamics, this equation is actually easy to describe in 
words. The vertex function describes a quark propagating from m to n and interacting 
with a gluon at site j. The operator [1 — Y^k^^j ^fc^^] simply insures that there is not 
already a gluon attached to the site j. See fig. 7a. The relation between and Z^^^{m, n) 
has the same form as the Ward identity in quantum electrodynamics. 

Since we want to calculate Z to order l/N'^, according to eq.(^9D we need to keep only the 
IPI diagrams of order 1 in Z^^^{m, n) and in Ti^^. These are just the IPI Hartree diagrams, 
i.e. the sum of all rainbow diagrams. Note that Ti^^ is simply related to the Bethe-Salpeter 
scattering amphtude A^^^mn ( P^ ) by 



mn 



k,l 



(50) 



where the summation over /c, / is restricted to m<k<j<l<n. This relation is 
represented graphically in fig. 8a 



Fig. 8a: Graphical representation of the relation between the vertex function and the 
scattering amplitude A^i^rnn- 

Third, we calculate for k + 1 > i 
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k 

Z{i, k + l) = Z{i, k) + J2 Vj,k+iZ{t,j - l)Z{j + 1, k) 

k 

+ E E m - l)Tl^Z{n + 1, /c) (51) 

j=\ m,n 

with the boundary condition Z{i,i) = 1, Z{i,i — 1) = 1, and Z(1,0) = 1. The meaning of 
this equation is expressed graphically in fig. 8b. 



i+1 k i i+1 k 



i+1 



i i+1 

r2 



Fig. 8b: Graphical representation of the recursion equation to order 1/N . The black triangle 
represents the vertex function r^„. 

We have checked this equation explicitly for L up to 6. A graph generated to order 
is displayed in fig. 9. 



Fig. 9: A contribution of order 1/A^^ generated by the modified recursion relation 

These equations are adequate to order 1/A^^, but not to order 
We summarize the steps of the new recursion relation. 

• Assume the partition functions Z{i,j) are known for all pairs such that i — j < I. 

• Calculate all the one-particle irreducible functions Z^^^{m, n) in the Hartree approx- 
imation. This is just the sum of all rainbow diagrams between m and n, with an 
interaction Vmn joining m and n. If no gluon is connected to the site i, then this 
contributes to F'^ . 
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• Insert this function F and all the functions Z{i,j) in (^TJ) to calculate the partition 
functions with one more base. 

• Iterate the process. 

This procedure allows obviously to evaluate the free energy of a given RNA sequence 
recursively. Regard Z(m, n) as the element in the mth row and nth column of a matrix. We 
impose the boundary conditions Z{j,j) = 1 and Z{j,j — 1) = 1. We then use (|5l| ) to expand 
the matrix to ever larger size, moving "towards the northeast." In numerical evaluation, we 
no longer need to know the origin of the parameter l/N"^: we can simply take = 1. The 
factor has just allowed us to extract the most relevant diagrams beyond the Hartree 

theory. 

To find the "ground state configuration" for a given RNA sequence we simply write (^Tj) 

for 

Z(l, L) = Z(l, L - 1) + Xi V,l{Z{1,j - l)Z{j + 1, L - 1) 

i=i 

+ m - l)Vl^Z{n + 1, L - 1)} (52) 

and evaluate it "backwards". We replace Z{1, L) by the largest term on the right hand side 

Z(l, L) ~ max, L - 1), V^lZ{1, j - l)Z{j + 1, L - 1), 

VjlZ{1, m - l)ri„Z(n + 1, L - 1)}} (53) 

The largest term, in turn, comprises Z of lower order, for which we can apply this bac- 
tracking algorithm. Repeating this process, we obviously obtain the dominant configuration. 

In fact, since the lowest energy configuration obtained in this way is not necessarily 
feasible in real space, a better strategy could be to use the backtracking algorithm to generate 
a set of lowest energy configurations, and check which one can be realized with real molecules 
with their rigidity and chemical constraints. For example, configurations such as the one 
of fig. 10 with crossing "gluon" hues should be discarded, as they are forbidden by steric 
constraints. 
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Fig. 10: A contribution of order sterically forbidden. 
VII. CONCLUSION 



We have shown that the RNA folding problem can be mapped onto a large N matrix field 
theory. The dominant term {N independent) is the usual Hartree theory, which is known to 
generate secondary structures. The 1/N correction term vanishes, and the correction of or- 
der l/N"^ generates the pseudo-knots or tertiary structure. The standard Hartree recursion 
relation is then replaced by a corrected recursion relation. The resulting three dimensional 
structure can be obtained by backtracking the recursion relation. The spatial feasibility of 
this tertiary structure (which remains to be checked) is due to the fact that the expan- 
sion classifies diagrams in terms of their topology. What remains to be done is to include 
the loop entropy, stacking energies and a numerical study of the recursion equations to order 
together with the backtracking algorithm. This will be presented in a forthcoming 

paper. 
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APPENDIX A 



In this appendix, we show that the correction to the free energy vanishes identically. 



We first note that eq. (22) can be recast in the form 



where 



M{Au.,A) 



1 

-11 + ai2 ai3 

—1 l + «23 



A 
aiL 
a2L 



v 



''L-2L "L-IL 



1 l + ai_ii 

1 1 



(54) 



The steepest descent method applied to (Q) yields 

A = 

whereas the definition for all the other Aui and A^, are identical to those of section IV and 
V. The correction of order 1/A^ to eq.(^) can be easily recast in the form 



Z^^^ = J daii,daii'da*da exp ^— tra^^/a*;, — traa* — -tr (^Mq 



X |A*(ltr(il/Vic)' + ^(tr(Mo-V' 



, \ --a*tT Mn^c 



2 1 
— ( 
3 



(55) 



with the notations of section IV and V and Mq denotes the matrix M evaluated at the 
stationary point. It is clear that a* occurs only in the term traa* of the first line and in the 
term a*tr (^Mq^cJ of the second line of (^5]). This second term can be integrated by part in 
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favor of a, to remove all dependence on a* except in the exponent. Once it is clear that a* 
occurs only in the exponent, we recognize the holomorphic representation of the 5-function. 
Thus, the integration over a* implies that we can set a = everywhere. This being done, we 
see that all the terms like al2, • • • , ^il and 02^, . . . , a2_i l present only in the exponent 
(in the tiau/a^, term). Therefore, we can integrate them out, and the result is again a set 
of (5-function which impose 

ai2 = . . . = aiL = = ■ ■ ■ = ^l-i.l = 

This procedure can be carried out recursively to "eat up" all the a* and a , leading to 
the vanishing of the correction (|55D . 



APPENDIX B 



A. Recursion 



An alternative strategy to evaluating Z recursively is by integrating out ^Pl+i in the 
expression for Z{1,L + 1). For notational simplicity, let us define //^ = M = 

(fL+i and T = Y.i=iiy~^)L+i,i^i- Evidently, we have to do two Gaussian integrals over M: 

J rfMe-^*'^(^^^+^^^') = C{^^, N)e'^''^" (56) 



and 



/ dMe-^*^(™+^^')M = -^C(/i, iV)e+^*^^V (57) 



where (^) is obtained by differentiating ( ^6]) with respect to the matrix T. Thus, af 



ter integrating out ^Pl+i in Z{1,L + 1), we find that the "action" Z]jj(^^^)jjtr(v^jV^i) 
has been replaced by the effective action J2ij(y~^)ij'^T^{'^i'Pj) where {V~^)ij = (y^^)ij — 
(y~^)i,L+i (Y-i) ^ {V^^)L+i.j- It is easy to see that V is the L by L matrix obtained by 

\ ^ ) L-\-l,L-\-l 

crossing out the last row and column of the L + 1 by L + 1 matrix V, as we might have 
expected. Putting these steps together we obtain 
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Z(l, L + 1) = Z(l, L) - -— j:{V-')L+i,i < ^tr(nii(l + > (58) 

where (n^]^(l + yjj)) is ordered as before. The expectation value of a matrix O constructed 
out of the yjj's is defined by 

<0>= J nfcdy.,e-'^^^-(''")-*^('^^'^^^0 (59) 

In other words, Z{1, L) =< -^trnj(l + ipi) >. 

To evaluate < ;^tr(n^^(l + ipi))'^i > we follow the standard procedure of replacing 

<^,e-^^E.,(^-^)».f('P.'P.) ^ _^^L^^^^_i_e-A^iE.,(V'-^)..tr(^.^,)_ Integrating by parts, we 

finally obtain 

Z{1, L + l) = Z(l, L) + X: < ^tr(ntli(l + ^,))ltr(n^^,+,(l + ^,)) > (60) 

k=l 

In other words, in ( ^Sf ) we have Wick contracted y?; with (/^^ in the ordered product 
Ilf^i{l + (fi). Evidently, ;^tr(n^!rj^(l + (fi)) is to be interpreted as 1 for = 1. Similarly, 
-^tr(nj"^^_^]^(l + ipj)) is to be interpreted as 1 for /c = L. 

In principle, we can extract what we need from this recursion relation (|60D. We emphasize 
that (|60D is derived without taking the large N limit and holds for finite N, including = 1. 

B. Large N Expansion 

We can now perform a large expansion, giving us a systematic way of evaluating Z 
to any desired order of l/N"^. In the language of quantum chromodynamics, quantities in 
which the indices of the matrices ipj are summed over such as -^tr(n^~^^(l + cpi)) are known 



as color singlet operators. It is well known [11] that given two color singlet operators A and 



B, the expectation value factorizes to leading order in large 

< AB >=< Ax B > + < AB >c (61) 



with the connected correlation function < AB >c suppressed by a factor of 0(1/A^^) relative 
to < y4 >< i? >. It is easy to see the validity of ( |6l]) by drawing a few diagrams such as 
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those in fig. 8b. Connected correlation functions < AB >c have been intensively studied 
||T3| in the matrix theory literature and a good deal is known about them. Thus, we can 
write in ( |60D 

< ^tr(nt-,i(i + ^.))^tr(nj^^,^,(i + ^,)) > = 
< ^tr(nt-i'(l + ^,)) X ltr(nj^^,^i(i + v^,)) > 
+ < ^tr(ntli(l + y,,))ltr(n^=,^i(l + ^,)) >c (62) 

We immediately recognize that first term in ( p^ ) as Z{1, k— l)Z{k + l, L). By definition, the 
connected correlation function Zc{l, k — l;k + 1, L) =< jjtT(n.'lzl{l + V9j))-^tr(n^^^^_j^(l + 
ifj)) >c is evaluated by contracting a matrix ipi from one of the traces to a matrix (pj from 
the other trace. Thus the exact recursion relation is given by 

Z{1, L + 1) = Z{1, L) + j2 VL+i,kZ{l, k - l)Z{k + 1, L) 

k=l 

L 

+ J2VL+i,kZcil,k-l;k + l,L) (63) 

k=l 

This gives an alternative representation of (P5|). Evidently, 

Zc(l, k-l;k + l,L)=< MEU,,+,M^i >c < M^l,^,M,i^, >c (64) 

In principle, we can take the exact recursion relation (^) and evaluate the two terms on 
the right hand side to any desired order in and thus generate, given an RNA sequence, 
secondary structure, tertiary structure, ad infinitum. 
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